Modeling and Quality of Masked Microdata
نویسنده
چکیده
Statistical organizations collect data via survey forms and other methods. The microdata are valuable for modeling and analysis. To produce a public-use file, the organizations mask the data in a manner that may prevent re-identification of data associated with individual entities. The public-use microdata may allow one or two sets of analyses that approximately reproduce analyses that could be performed on the original microdata. This paper describes a general method of creating models of data that is related to methods of creating appropriate aggregates of data that are needed for sufficient statistics in general classes of models (Moore and Lee 1998, DuMouchel et al. 2000, Owen 2003). If the aggregates can be approximately reproduced, then the masked microdata may allow one or more analyses that correspond to analyses on the original, non-public microdata. It will typically not yield data suitable for general analyses.
منابع مشابه
Global Measures of Data Utility for Microdata Masked for Disclosure Limitation
When releasing microdata to the public, data disseminators typically alter the original data to protect the confidentiality of database subjects’ identities and sensitive attributes. However, such alteration negatively impacts the utility (quality) of the released data. In this paper, we present quantitative measures of data utility for masked microdata, with the aim of improving disseminators’...
متن کاملAutomatic Generation of Masked Microdata
Disclosure Control is the discipline concerned with the modification of data containing confidential information about individual entities, such as persons, households, businesses, etc. in order to prevent third parties working with these data from recognizing entities in the data and thereby disclosing information about these entities. In very broad terms, disclosure risk is the risk that a gi...
متن کاملReverse Mapping to Preserve the Marginal Distributions of Attributes in Masked Microdata
In this paper we describe a new procedure that is capable of ensuring that the marginal distributions of attributes in microdata masked with a masking mechanism end up being the same as the marginal distributions of attributes in the original data. We illustrate the application of the new procedure using several commonly used masking mechanisms.
متن کاملP-Sensitive K-Anonymity with Generalization Constraints
Numerous privacy models based on the k‐anonymity property and extending the k‐anonymity model have been introduced in the last few years in data privacy re‐ search: l‐diversity, p‐sensitive k‐anonymity, (α, k) – anonymity, t‐closeness, etc. While differing in their methods and quality of their results, they all focus first on masking the data, and then protecting the quality of the data as a wh...
متن کاملControlled shuffling, statistical confidentiality and microdata utility: a successful experiment with a 10% household sample of the 2011 population census of Ireland for the IPUMS-International database
IPUMS-International disseminates more than two hundred-fifty integrated, confidentialized census microdata samples to thousands of researchers world-wide at no cost. The number of samples is increasing at the rate of several dozen per year, as quickly as the task of integrating metadata and microdata is completed. Protecting the statistical confidentiality and privacy of individuals represented...
متن کامل